Transcriptional Regulation

In molecular biology and genetics, transcriptional regulation is the means by which a cell regulates the conversion of DNA to RNA (transcription), thereby orchestrating gene activity. A single gene can be regulated in a range of ways, from altering the number of copies of RNA that are transcribed to controlling when the gene is transcribed. This control allows the cell or organism to respond to a variety of intra- and extracellular signals. Examples include producing the mRNAs that encode enzymes needed to adapt to a change in food source, producing the gene products involved in cell cycle specific activities, and producing the gene products responsible for cellular differentiation in multicellular eukaryotes, as studied in evolutionary developmental biology.
The regulation of transcription is a vital process in all living organisms. It is orchestrated by transcription factors and other proteins working in concert to finely tune the amount of RNA being produced through a variety of mechanisms.
Bacteria and eukaryotes have very different strategies for controlling transcription, but some important features are conserved between the two. The most important is combinatorial control: any given gene is likely controlled by a specific combination of factors. In a hypothetical example, the factors A and B might regulate a distinct set of genes from the combination of factors A and C. This combinatorial nature extends to complexes of far more than two proteins, and allows a very small subset (less than 10%) of the genome to control the transcriptional program of the entire cell.
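The hypothetical A/B versus A/C example can be made concrete with a small sketch. The gene names, the factor table, and both function names below are illustrative assumptions, not data from any real organism; the point is only that a handful of factors yields many distinct combinations.

```python
from math import comb

# Hypothetical regulatory table (illustrative only): each gene is transcribed
# only when every factor in its required set is present in the cell.
REQUIRED_FACTORS = {
    "gene1": frozenset({"A", "B"}),
    "gene2": frozenset({"A", "C"}),
    "gene3": frozenset({"B", "C"}),
    "gene4": frozenset({"A", "B", "C"}),
}


def active_genes(present_factors):
    """Return, in sorted order, the genes whose required factors are all present."""
    present = set(present_factors)
    return sorted(g for g, need in REQUIRED_FACTORS.items() if need <= present)


def count_combinations(n_factors, max_size):
    """Count distinct non-empty factor combinations of size 1..max_size."""
    return sum(comb(n_factors, k) for k in range(1, max_size + 1))


if __name__ == "__main__":
    print(active_genes({"A", "B"}))   # only genes that need exactly A and B
    print(active_genes({"A", "C"}))   # a different gene set for A and C
    print(count_combinations(10, 3))  # combinations available from 10 factors
```

Under these assumptions, ten factors acting in groups of up to three already give well over a hundred distinct combinations, which is one way to see how a small fraction of the genome could direct a much larger transcriptional program.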


In bacteria

Much of the early understanding of transcription came from bacteria, although the extent and complexity of transcriptional regulation is greater in eukaryotes. Bacterial transcription is governed by three main sequence elements:
* Promoters are elements of DNA, directly upstream of the gene, that may bind RNA polymerase and other proteins for the successful initiation of transcription.
* Operators are stretches of DNA recognized by repressor proteins, whose binding inhibits transcription of the gene.
* Positive control elements are DNA sequences whose bound proteins incite higher levels of transcription.
While these means of transcriptional regulation also exist in eukaryotes, the transcriptional landscape there is significantly more complicated, both because of the number of proteins involved and because of the presence of introns and the packaging of DNA around histones.
The transcription of a basic bacterial gene depends on the strength of its promoter and the presence of activators or repressors. In the absence of other regulatory elements, a promoter's sequence-based affinity for RNA polymerase varies, which results in the production of different amounts of transcript. The variable affinity of RNA polymerase for different promoter sequences is related to regions of consensus sequence upstream of the transcription start site. The more nucleotides of a promoter that agree with the consensus sequence, the stronger the affinity of the promoter for RNA polymerase is likely to be.
In the absence of other regulatory elements, the default state of a bacterial gene is the “on” configuration, resulting in the production of some amount of transcript. As a result, regulation by protein repressors and positive control elements can either decrease or increase transcription. Repressors often physically occupy the promoter location, occluding RNA polymerase from binding. Alternatively, a repressor and polymerase may bind to the DNA at the same time, with a physical interaction between the repressor and the polymerase preventing the opening of the DNA for access to the minus strand for transcription. This strategy of control is distinct from eukaryotic transcription, whose basal state is off and where the co-factors required for transcription initiation are highly gene dependent.
Sigma factors are specialized bacterial proteins that bind to RNA polymerases and orchestrate transcription initiation. Sigma factors act as mediators of sequence-specific transcription, such that a single sigma factor can be used for transcription of all housekeeping genes, or of a suite of genes the cell expresses in response to an external stimulus such as stress.
In addition to processes that regulate transcription at the stage of initiation, mRNA synthesis is also controlled by the rate of transcription elongation. RNA polymerase pauses occur frequently and are regulated by transcription factors such as NusG and NusA, by transcription-translation coupling, and by mRNA secondary structure.
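The relationship between consensus agreement and promoter strength can be sketched as a simple match count. This is a minimal illustration under stated assumptions: it uses the -35 (TTGACA) and -10 (TATAAT) hexamers of E. coli sigma-70 promoters, which the text above does not name, and it treats the fraction of matching positions as a crude stand-in for affinity. Real promoter strength also depends on spacing and flanking sequence, which this sketch ignores. The function names and the second example promoter are hypothetical.

```python
# Consensus hexamers for E. coli sigma-70 promoters (an assumption of this
# sketch; the surrounding text does not specify any particular consensus).
CONSENSUS_MINUS_35 = "TTGACA"
CONSENSUS_MINUS_10 = "TATAAT"


def count_matches(site: str, consensus: str) -> int:
    """Count positions where a promoter element agrees with the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a == b for a, b in zip(site.upper(), consensus.upper()))


def consensus_score(minus_35: str, minus_10: str) -> float:
    """Fraction of the 12 consensus positions matched (1.0 = perfect match)."""
    matched = (count_matches(minus_35, CONSENSUS_MINUS_35)
               + count_matches(minus_10, CONSENSUS_MINUS_10))
    return matched / (len(CONSENSUS_MINUS_35) + len(CONSENSUS_MINUS_10))


if __name__ == "__main__":
    # A promoter matching the consensus at every position.
    print(consensus_score("TTGACA", "TATAAT"))
    # A hypothetical weaker promoter with three mismatches in total.
    print(consensus_score("TTTACA", "TATGTT"))
```

A higher score here only suggests a likely stronger promoter, in line with the hedged wording of the paragraph above; it is not a measured binding constant.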


In eukaryotes

The added complexity of generating a eukaryotic cell carries with it an increase in the complexity of transcriptional regulation. Eukaryotes have three RNA polymerases, known as Pol I, Pol II, and Pol III. Each polymerase has specific targets and activities, and is regulated by independent mechanisms. There are a number of additional mechanisms through which polymerase activity can be controlled. These mechanisms can be generally grouped into three main areas:
* Control over polymerase access to the gene. This is perhaps the broadest of the three control mechanisms. It includes the functions of histone remodeling enzymes, transcription factors, enhancers and repressors, and many other complexes.
* Productive elongation of the RNA transcript. Once polymerase is bound to a promoter, it requires another set of factors to allow it to escape the promoter complex and begin successfully transcribing RNA.
* Termination of the polymerase. A number of factors have been found to control how and when termination occurs, which dictates the fate of the RNA transcript.
All three of these systems work in concert to integrate signals from the cell and change the transcriptional program accordingly.
While in prokaryotic systems the basal transcription state can be thought of as nonrestrictive (that is, “on” in the absence of modifying factors), eukaryotes have a restrictive basal state which requires the recruitment of other factors in order to generate RNA transcripts. This difference is largely due to the compaction of the eukaryotic genome by winding DNA around histones to form higher order structures. This compaction makes the gene promoter inaccessible without the assistance of other factors in the nucleus, and thus chromatin structure is a common site of regulation. Similar to the sigma factors in prokaryotes, the general transcription factors (GTFs) are a set of factors in eukaryotes that are required for all transcription events. These factors are responsible for stabilizing binding interactions and opening the DNA helix to allow the RNA polymerase to access the template, but generally lack specificity for different promoter sites. A large part of gene regulation occurs through transcription factors that either recruit or inhibit the binding of the general transcription machinery and/or the polymerase. This can be accomplished through close interactions with core promoter elements, or through long-distance enhancer elements.
Once a polymerase is successfully bound to a DNA template, it often requires the assistance of other proteins in order to leave the stable promoter complex and begin elongating the nascent RNA strand. This process is called promoter escape, and is another step at which regulatory elements can act to accelerate or slow the transcription process. Similarly, protein and nucleic acid factors can associate with the elongation complex and modulate the rate at which the polymerase moves along the DNA template.


At the level of chromatin state

In eukaryotes, genomic DNA is highly compacted so that it fits into the nucleus. This is accomplished by winding the DNA around octamers of proteins called histones, which has consequences for the physical accessibility of parts of the genome at any given time. Significant portions are silenced through histone modifications, and thus are inaccessible to the polymerases or their cofactors. The highest level of transcription regulation occurs through the rearrangement of histones to expose or sequester genes, because these processes can render entire regions of a chromosome inaccessible, as occurs in imprinting.
Histone rearrangement is facilitated by post-translational modifications to the tails of the core histones. A wide variety of modifications can be made by enzymes such as the histone acetyltransferases (HATs), histone methyltransferases (HMTs), and histone deacetylases (HDACs), among others. These enzymes can add or remove covalent modifications such as methyl groups, acetyl groups, phosphates, and ubiquitin. Histone modifications serve to recruit other proteins, which can either increase the compaction of the chromatin and sequester promoter elements, or increase the spacing between histones and allow transcription factors or polymerase to associate with open DNA. For example, H3K27 trimethylation by the polycomb complex PRC2 causes chromosomal compaction and gene silencing. These histone modifications may be created by the cell, or inherited in an epigenetic fashion from a parent.


At the level of cytosine methylation

Transcription regulation at about 60% of promoters is controlled by methylation of cytosines within CpG dinucleotides (CpG sites, where a 5’ cytosine is followed by a 3’ guanine). 5-methylcytosine (5-mC) is a methylated form of the DNA base cytosine (see Figure). 5-mC is an epigenetic marker found predominantly within CpG sites. About 28 million CpG dinucleotides occur in the human genome. In most tissues of mammals, on average, 70% to 80% of CpG cytosines are methylated (forming 5-methylCpG or 5-mCpG). CpG sites often occur in clusters, called CpG islands. About 60% of promoter sequences have a CpG island, while only about 6% of enhancer sequences have one. CpG islands act as regulatory sequences: when the CpG island in the promoter of a gene is methylated, transcription of that gene can be reduced or silenced.
DNA methylation regulates gene transcription through interaction with methyl binding domain (MBD) proteins, such as MeCP2, MBD1 and MBD2. These MBD proteins bind most strongly to highly methylated CpG islands. They have both a methyl-CpG-binding domain and a transcription repression domain. They bind to methylated DNA and guide or direct protein complexes with chromatin remodeling and/or histone modifying activity to methylated CpG islands. MBD proteins generally repress local chromatin, for example by catalyzing the introduction of repressive histone marks, or by creating an overall repressive chromatin environment through nucleosome remodeling and chromatin reorganization.
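The definition of a CpG site (a cytosine followed 5’ to 3’ by a guanine) lends itself to a short counting sketch. The thresholds in `looks_like_cpg_island` (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) are one commonly used sequence heuristic and are an assumption of this example, not something stated in the text above. The function names are hypothetical, and the sketch says nothing about methylation status, which cannot be read from sequence alone.

```python
def cpg_stats(seq: str) -> dict:
    """Compute CpG count, GC content, and observed/expected CpG ratio."""
    s = seq.upper()
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    cpg_count = s.count("CG")  # "CG" cannot overlap itself, so this is exact
    gc_content = (c_count + g_count) / n if n else 0.0
    # Expected CpG count if C and G were placed independently along the sequence.
    expected = (c_count * g_count) / n if n else 0.0
    obs_exp = cpg_count / expected if expected else 0.0
    return {
        "length": n,
        "cpg_count": cpg_count,
        "gc_content": gc_content,
        "obs_exp": obs_exp,
    }


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply assumed heuristic thresholds to one candidate window."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_length
            and stats["gc_content"] > min_gc
            and stats["obs_exp"] > min_obs_exp)


if __name__ == "__main__":
    print(cpg_stats("CGCGCG"))
    print(looks_like_cpg_island("CG" * 100))  # synthetic CpG-dense window
    print(looks_like_cpg_island("AT" * 100))  # synthetic window with no CpG
```

In practice such a test would be applied to sliding windows across a promoter region; the single-window form is kept here for brevity.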
Transcription factors are proteins that bind to specific DNA sequences in order to regulate the expression of a gene. The binding sequence for a transcription factor in DNA is usually about 10 or 11 nucleotides long. As summarized in 2009, Vaquerizas et al. indicated there are approximately 1,400 different transcription factors encoded in the human genome, by genes that constitute about 6% of all human protein-encoding genes. About 94% of transcription factor binding sites (TFBSs) that are associated with signal-responsive genes occur in enhancers, while only about 6% of such TFBSs occur in promoters.
EGR1 protein is a particular transcription factor that is important for regulation of methylation of CpG islands. An EGR1 transcription factor binding site is frequently located in enhancer or promoter sequences. There are about 12,000 binding sites for EGR1 in the mammalian genome, and about half of them are located in promoters and half in enhancers. The binding of EGR1 to its target DNA binding site is insensitive to cytosine methylation in the DNA. While only small amounts of EGR1 transcription factor protein are detectable in unstimulated cells, production of EGR1 protein from the ''EGR1'' gene is drastically elevated one hour after stimulation. Expression of EGR1 transcription factor proteins, in various types of cells, can be stimulated by growth factors, neurotransmitters, hormones, stress and injury. In the brain, when neurons are activated, EGR1 proteins are up-regulated and they bind to (recruit) the pre-existing TET1 enzymes, which are highly expressed in neurons.
TET enzymes can catalyse demethylation of 5-methylcytosine. When EGR1 transcription factors bring TET1 enzymes to EGR1 binding sites in promoters, the TET enzymes can demethylate the methylated CpG islands at those promoters. Upon demethylation, these promoters can then initiate transcription of their target genes. Hundreds of genes in neurons are differentially expressed after neuron activation through EGR1 recruitment of TET1 to methylated regulatory sequences in their promoters.
The methylation of promoters is also altered in response to signals. The three mammalian DNA methyltransferases (DNMT1, DNMT3A, and DNMT3B) catalyze the addition of methyl groups to cytosines in DNA. While DNMT1 is a “maintenance” methyltransferase, DNMT3A and DNMT3B can carry out new methylations. There are also two splice protein isoforms produced from the ''DNMT3A'' gene: the DNA methyltransferase proteins DNMT3A1 and DNMT3A2. The splice isoform DNMT3A2 behaves like the product of a classical immediate-early gene; for instance, it is robustly and transiently produced after neuronal activation. Where DNMT3A2 binds and adds methyl groups to cytosines appears to be determined by histone post-translational modifications. On the other hand, neural activation causes degradation of DNMT3A1, accompanied by reduced methylation of at least one evaluated targeted promoter.


Through transcription factors and enhancers


Transcription factors

Transcription factors are proteins that bind to specific DNA sequences in order to regulate the expression of a given gene. There are approximately 1,400 transcription factors in the human genome, and they constitute about 6% of all human protein-coding genes. The power of transcription factors resides in their ability to activate and/or repress wide repertoires of downstream target genes. Because transcription factors work in a combinatorial fashion, only a small subset of an organism's genome needs to encode them. Transcription factors function through a wide variety of mechanisms. In one mechanism, CpG methylation influences binding of most transcription factors to DNA, in some cases negatively and in others positively. In addition, transcription factors often sit at the end of a signal transduction pathway that changes something about the factor, such as its subcellular localization or its activity. Post-translational modifications to transcription factors located in the cytosol can cause them to translocate to the nucleus, where they can interact with their corresponding enhancers. Other transcription factors are already in the nucleus, and are modified to enable the interaction with partner transcription factors. Some post-translational modifications known to regulate the functional state of transcription factors are phosphorylation, acetylation, SUMOylation and ubiquitylation.
Transcription factors can be divided into two main categories: activators and repressors. While activators can interact directly or indirectly with the core machinery of transcription through enhancer binding, repressors predominantly recruit co-repressor complexes, leading to transcriptional repression by chromatin condensation of enhancer regions. A repressor may also function by direct competition with a particular activator: overlapping DNA-binding motifs for activators and repressors lead to a physical competition to occupy the binding site. If the repressor has a higher affinity for its motif than the activator, transcription would be effectively blocked in the presence of the repressor.
Tight regulatory control is achieved by the highly dynamic nature of transcription factors. Again, many different mechanisms exist to control whether a transcription factor is active. These mechanisms include control over protein localization and control over whether the protein can bind DNA. An example is the protein HSF1, which remains bound to Hsp70 in the cytosol and is only translocated into the nucleus upon cellular stress such as heat shock. Thus the genes under the control of this transcription factor remain untranscribed unless the cell is subjected to stress.
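The competition between an activator and a repressor for overlapping binding motifs can be sketched with a simple equilibrium occupancy calculation. This is a minimal model under stated assumptions that go beyond the text: a single site, mutually exclusive binding, thermodynamic equilibrium, and free protein concentrations equal to the values passed in. The function name, parameter names, and numbers are hypothetical. Affinity is expressed as a dissociation constant, so a lower Kd means tighter binding.

```python
def site_occupancy(activator_conc: float, activator_kd: float,
                   repressor_conc: float, repressor_kd: float) -> dict:
    """Fraction of time one shared site is free, activator-bound, or repressor-bound.

    Concentrations and dissociation constants must share the same units.
    """
    if activator_kd <= 0 or repressor_kd <= 0:
        raise ValueError("dissociation constants must be positive")
    a = activator_conc / activator_kd  # statistical weight of activator-bound state
    r = repressor_conc / repressor_kd  # statistical weight of repressor-bound state
    z = 1.0 + a + r                    # sum over the three mutually exclusive states
    return {"free": 1.0 / z, "activator": a / z, "repressor": r / z}


if __name__ == "__main__":
    # Equal concentrations, but the repressor binds ten times more tightly.
    print(site_occupancy(activator_conc=10.0, activator_kd=10.0,
                         repressor_conc=10.0, repressor_kd=1.0))
    # Same activator with no repressor present.
    print(site_occupancy(activator_conc=10.0, activator_kd=10.0,
                         repressor_conc=0.0, repressor_kd=1.0))
```

With equal concentrations and a tenfold tighter repressor, the repressor-bound state dominates in this model, which matches the qualitative statement that a higher-affinity repressor would effectively block transcription.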


Enhancers

Enhancers or cis-regulatory modules/elements (CRM/CRE) are non-coding DNA sequences containing multiple activator and repressor binding sites. Enhancers range from 200 bp to 1 kb in length and can be either proximal (5’ upstream of the promoter or within the first intron of the regulated gene) or distal (in introns of neighboring genes or in intergenic regions far away from the locus). Through DNA looping, active enhancers contact the promoter, depending on the promoter specificity of the core DNA-binding motif. The promoter-enhancer dichotomy provides the basis for the functional interaction between transcription factors and the core transcriptional machinery that triggers RNA Pol II escape from the promoter. Although one might expect a 1:1 enhancer-promoter ratio, studies of the human genome predict that an active promoter interacts with 4 to 5 enhancers. Similarly, enhancers can regulate more than one gene without linkage restriction and are said to “skip” neighboring genes to regulate more distant ones. Although infrequent, transcriptional regulation can involve elements located on a chromosome different from the one where the promoter resides. Proximal enhancers or promoters of neighboring genes can serve as platforms to recruit more distal elements.


Enhancer activation and implementation

Up-regulated expression of genes in mammals can be initiated when signals are transmitted to the promoters associated with the genes. Cis-regulatory DNA sequences that are located in DNA regions distant from the promoters of genes can have very large effects on gene expression, with some genes undergoing up to 100-fold increased expression due to such a cis-regulatory sequence. These cis-regulatory sequences include enhancers, silencers, insulators and tethering elements. Among this constellation of sequences, enhancers and their associated transcription factor proteins have a leading role in the regulation of gene expression.
Enhancers are sequences of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene expression programs, most often by looping through long distances to come into physical proximity with the promoters of their target genes. In a study of brain cortical neurons, 24,937 loops were found, bringing enhancers to promoters. Multiple enhancers, each often tens or hundreds of thousands of nucleotides distant from their target genes, loop to their target gene promoters and coordinate with each other to control expression of their common target gene.
The schematic illustration in this section shows an enhancer looping around to come into close physical proximity with the promoter of a target gene. The loop is stabilized by a dimer of a connector protein (e.g. a dimer of CTCF or YY1), with one member of the dimer anchored to its binding motif on the enhancer and the other member anchored to its binding motif on the promoter (represented by the red zigzags in the illustration). Several cell-function-specific transcription factor proteins (in 2018 Lambert et al. indicated there were about 1,600 transcription factors in a human cell) generally bind to specific motifs on an enhancer, and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, governs the level of transcription of the target gene. Mediator (a coactivator complex usually consisting of about 26 proteins in an interacting structure) communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (RNAP II) enzyme bound to the promoter.
Enhancers, when active, are generally transcribed from both strands of DNA, with RNA polymerases acting in two different directions and producing two eRNAs, as illustrated in the Figure. An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it, and that activated transcription factor may then activate the enhancer to which it is bound (see the small red star representing phosphorylation of a transcription factor bound to an enhancer in the illustration). An activated enhancer begins transcription of its RNA before activating a promoter to initiate transcription of messenger RNA from its target gene.


Regulatory landscape

Transcriptional initiation, termination and regulation are mediated by “DNA looping”, which brings together promoters, enhancers, transcription factors and RNA processing factors to accurately regulate gene expression. Chromosome conformation capture (3C) and, more recently, Hi-C techniques provided evidence that active chromatin regions are “compacted” in nuclear domains or bodies where transcriptional regulation is enhanced. The configuration of the genome is essential for enhancer-promoter proximity. Cell-fate decisions are mediated by highly dynamic genomic reorganizations at interphase that modularly switch entire gene regulatory networks on or off through short- to long-range chromatin rearrangements. Related studies demonstrate that metazoan genomes are partitioned into structural and functional units around a megabase long, called topologically associating domains (TADs), containing dozens of genes regulated by hundreds of enhancers distributed within large genomic regions containing only non-coding sequences. The function of TADs is to group enhancers and promoters that interact with each other within a single large functional domain instead of having them spread across different TADs. However, studies of mouse development point out that two adjacent TADs may regulate the same gene cluster. The most relevant study on limb evolution shows that the TAD on the 5’ side of the HoxD gene cluster in tetrapod genomes drives its expression in the distal limb bud of embryos, giving rise to the hand, while the one located on the 3’ side does so in the proximal limb bud, giving rise to the arm. Still, it is not known whether TADs are an adaptive strategy to enhance regulatory interactions or an effect of the constraints on these same interactions. TAD boundaries are often composed of housekeeping genes, tRNAs, other highly expressed sequences and short interspersed elements (SINEs). While these genes may take advantage of their border position to be ubiquitously expressed, they are not directly linked with TAD edge formation. The specific molecules identified at the boundaries of TADs are called insulators or architectural proteins because they not only block leaky enhancer-driven expression but also ensure an accurate compartmentalization of cis-regulatory inputs to the targeted promoter. These insulators are DNA-binding proteins like CTCF and TFIIIC that help recruit structural partners such as cohesins and condensins. The localization and binding of architectural proteins to their corresponding binding sites is regulated by post-translational modifications. DNA binding motifs recognized by architectural proteins are either high-occupancy sites spaced about a megabase apart or low-occupancy sites located inside TADs. High-occupancy sites are usually conserved and static, while intra-TAD sites are dynamic according to the state of the cell. TADs themselves are therefore compartmentalized into subdomains, which can be called subTADs, ranging from a few kb up to the length of a TAD (19). When architectural binding sites are less than 100 kb from each other, Mediator proteins are the architectural proteins that cooperate with cohesin. For subTADs larger than 100 kb and for TAD boundaries, CTCF is the typical insulator found to interact with cohesin.
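
The idea that an enhancer and a promoter are grouped within a single TAD can be illustrated with a minimal Python sketch. It treats TADs as consecutive, non-overlapping intervals between sorted boundary coordinates, which is a simplification that ignores subTADs and the adjacent-TAD regulation described above. The boundary coordinates and the function names `tad_index` and `same_tad` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical TAD boundary coordinates (in bp) on one chromosome, sorted.
# Consecutive boundaries delimit one TAD: [0, 1 Mb), [1 Mb, 2.2 Mb), ...
TAD_BOUNDARIES = [0, 1_000_000, 2_200_000, 3_100_000]


def tad_index(position, boundaries=TAD_BOUNDARIES):
    """Return the index of the TAD containing `position`, or None if outside."""
    if position < boundaries[0] or position >= boundaries[-1]:
        return None
    # bisect_right counts the boundaries at or to the left of `position`.
    return bisect_right(boundaries, position) - 1


def same_tad(enhancer_pos, promoter_pos, boundaries=TAD_BOUNDARIES):
    """True if both loci fall inside the same TAD interval."""
    enhancer_tad = tad_index(enhancer_pos, boundaries)
    promoter_tad = tad_index(promoter_pos, boundaries)
    return enhancer_tad is not None and enhancer_tad == promoter_tad


if __name__ == "__main__":
    # An enhancer about 900 kb from its promoter, both inside the second TAD.
    print(same_tad(1_200_000, 2_100_000))
    # Loci only 200 kb apart but separated by the boundary at 1 Mb.
    print(same_tad(900_000, 1_100_000))
```

In this toy model, linear distance alone does not decide whether two loci share a domain; what matters is whether a boundary lies between them.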


Of the pre-initiation complex and promoter escape

In eukaryotes, ribosomal RNA (rRNA) and the tRNAs involved in translation are transcribed by RNA polymerase I (Pol I) and RNA polymerase III (Pol III). RNA polymerase II (Pol II) is responsible for the production of messenger RNA (mRNA) within the cell. Particularly for Pol II, many of the regulatory checkpoints in the transcription process occur in the assembly and escape of the pre-initiation complex. A gene-specific combination of transcription factors will recruit TFIID and/or TFIIA to the core promoter, followed by the association of TFIIB, creating a stable complex onto which the rest of the General Transcription Factors (GTFs) can assemble. This complex is relatively stable, and can undergo multiple rounds of transcription initiation. After the binding of TFIIB and TFIID, Pol II and the rest of the GTFs can assemble. This assembly is marked by the post-translational modification (typically phosphorylation) of the C-terminal domain (CTD) of Pol II by a number of kinases. The CTD is a large, unstructured domain extending from the RPB1 subunit of Pol II, and consists of many repeats of the heptad sequence YSPTSPS. TFIIH, the helicase that remains associated with Pol II throughout transcription, also contains a subunit with kinase activity that phosphorylates serine 5 of the heptad sequence. Similarly, both CDK8 (a subunit of the massive multiprotein Mediator complex) and CDK9 (a subunit of the P-TEFb elongation factor) have kinase activity towards other residues on the CTD. These phosphorylation events promote the transcription process and serve as sites of recruitment for mRNA processing machinery. All three of these kinases respond to upstream signals, and failure to phosphorylate the CTD can lead to a stalled polymerase at the promoter.
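
As a small illustration of the heptad numbering used above, the following Python sketch builds an idealized CTD from perfect YSPTSPS repeats and lists where a given heptad position (for example serine 5, the TFIIH target) falls in the sequence. Real CTDs contain many repeats, some of which deviate from the consensus; the repeat count of 3 and the helper names `build_ctd` and `heptad_residue_positions` are assumptions made purely for this example.

```python
HEPTAD = "YSPTSPS"  # consensus CTD heptad, residues numbered 1 to 7


def build_ctd(n_repeats):
    """Return an idealized CTD made of `n_repeats` perfect consensus heptads."""
    return HEPTAD * n_repeats


def heptad_residue_positions(ctd, position_in_heptad):
    """Return 0-based indices in `ctd` of the residue at the given
    1-based position within each heptad repeat."""
    if not 1 <= position_in_heptad <= len(HEPTAD):
        raise ValueError("position_in_heptad must be between 1 and 7")
    if len(ctd) % len(HEPTAD) != 0:
        raise ValueError("sequence is not a whole number of heptads")
    offset = position_in_heptad - 1
    return [start + offset for start in range(0, len(ctd), len(HEPTAD))]


if __name__ == "__main__":
    ctd = build_ctd(3)  # hypothetical repeat count, chosen for brevity
    ser5 = heptad_residue_positions(ctd, 5)
    print(ctd)
    print(ser5, [ctd[i] for i in ser5])
```

Each returned index points at a serine, one per repeat, which is why a single kinase specificity can mark the CTD along its whole length.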


In cancer

In vertebrates, the majority of gene promoters contain a CpG island with numerous CpG sites. When many of a gene's promoter CpG sites are methylated, the gene becomes silenced. Colorectal cancers typically have 3 to 6 driver mutations and 33 to 66 hitchhiker or passenger mutations. However, transcriptional silencing may be of more importance than mutation in causing progression to cancer. For example, in colorectal cancers about 600 to 800 genes are transcriptionally silenced by CpG island methylation (see regulation of transcription in cancer). Transcriptional repression in cancer can also occur by other epigenetic mechanisms, such as altered expression of microRNAs. In breast cancer, transcriptional repression of BRCA1 may occur more frequently by over-expressed microRNA-182 than by hypermethylation of the BRCA1 promoter (see Low expression of BRCA1 in breast and ovarian cancers).
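
The notion of promoter CpG sites and the share of them that are methylated can be made concrete with a short Python sketch. A CpG site is a cytosine immediately followed by a guanine in the 5' to 3' direction, so the code simply scans for the dinucleotide "CG". The example sequence, the set of methylated positions, and the function names `cpg_sites` and `methylated_fraction` are hypothetical. No silencing threshold is implied, since the text above says only that "many" sites must be methylated.

```python
def cpg_sites(sequence):
    """Return 0-based positions of the C in every CpG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def methylated_fraction(sequence, methylated_positions):
    """Fraction of CpG sites in `sequence` whose cytosine is methylated.

    `methylated_positions` holds 0-based indices of methylated cytosines,
    e.g. as they might be reported by a methylation assay (hypothetical input).
    """
    sites = cpg_sites(sequence)
    if not sites:
        return 0.0
    methylated = set(methylated_positions)
    return sum(1 for site in sites if site in methylated) / len(sites)


if __name__ == "__main__":
    promoter = "ACGTCGCGAA"   # toy sequence, not a real promoter
    print(cpg_sites(promoter))
    print(methylated_fraction(promoter, {1, 6}))
```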


External links


Plant Transcription Factor Database and Plant Transcriptional Regulation Data and Analysis Platform

MIT: Activating a new understanding of gene regulation